Web Page Recovery    09-30-2009

How To Recover Webpages

Two weeks ago I walked into my computer room and heard the dreaded hard drive click of death. The computer I use as my combination web server and domain controller is toast. (Yes, I know you shouldn't run them on the same box, but it's for a home domain, so I'm not really worried.) I knew this could happen, but I was not too worried. It is rare that you get to build a domain from scratch, so I had never backed it up; I figured I could always use the experience of setting up another one if it ever crashed. So far, no problem. As for rebuilding the web pages, I had a copy on my main computer, which I back up every month or so, so once again no major issue. Then I checked the date on the copy on my main computer. It was 2 years old. ARGGH!!!! Now I have a problem. So here are some thoughts on recovering web pages that you might find interesting.

So how did this happen? For many years I made changes to my website by editing the files on my local computer. These files live in a folder in my data directory, which gets backed up. After I looked at the changes and liked them, I would copy the files to the web server. So far, so good. About 2 years ago I started playing with ASP files. ASP pages only run under an IIS web server, and since I didn't want to install IIS on my main computer, I started making changes to those pages directly on the production server. They were only test files, so still no problem. But then I must have gotten used to it and started making all my changes directly on the web server. Now we have a problem.

So, 2 years of updates are gone: about 25 pages of text and pictures. What now?

Internet Cache Files

Everyone knows that your computer keeps a copy of every web page you visit in the Internet cache folder. You see it in TV shows and in articles about clearing those folders so people cannot see where you have been. The reality is less useful. There is a history list, and yes, unless you clear it, it lists every web page you have visited, going back years. And yes, there is a cache folder of web pages and images that the computer keeps locally so it can load frequently visited sites from disk instead of downloading them every time. But as for any rhyme or reason to WHICH files it keeps or how long it keeps them, you would need someone from Microsoft, because it makes no sense. I have 4 computers that I use regularly, and I frequently visit different pages on my website from each of them to make sure everything looks right and to show people pictures. After scanning the cache on every one of these computers, I found a total of 4 web pages and 9 pictures from my site. Some were pages I had updated in the past month; others had not been looked at in a year or more. Totally random.
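If you want to automate a cache scan like this across several machines, here is a minimal Python sketch. It is an illustration only and assumes you have copied each computer's cache folder somewhere you can read it (the folder location depends on the browser and Windows version) and that your pages contain some text you can search for, such as your domain name or a phrase from your footer. The function name `find_cached_files` is hypothetical. Pages that use only relative links may not contain the domain at all, and cached pictures almost never do, so a phrase unique to your site is often the better search string.

```python
import os
from pathlib import Path


def find_cached_files(cache_dir, marker):
    """Return paths of files under cache_dir whose contents contain marker.

    The match is a simple case-insensitive byte search (ASCII only), so it
    finds cached HTML pages that mention the marker text; binary files such
    as images will normally not match.
    """
    needle = marker.lower().encode("ascii")
    matches = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = Path(root) / name
            try:
                data = path.read_bytes()
            except OSError:
                # Locked or unreadable cache entries are skipped.
                continue
            if needle in data.lower():
                matches.append(path)
    return sorted(matches)
```

Run it once per copied cache folder, for example `find_cached_files("copied-cache-laptop", "TimJenkinsWeb.com")`, where the folder name is just a placeholder.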

Internet Archive

There is a website whose primary purpose is to store copies of old web pages for posterity: the Internet Archive's Wayback Machine (archive.org). I think the original idea was a research tool covering news sites and other major sites, but it has branched out into saving a ton of stuff. It had actually saved copies of my web pages, but only from 2004-2005. Interesting, but not useful for my current problem.
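The Wayback Machine can also be queried directly instead of browsed by hand. Below is a minimal Python sketch that builds a query against the Internet Archive's CDX search endpoint, which lists the captures it holds for every URL under a prefix, and then keeps only the newest capture of each page so those can be opened first. The endpoint and its `url`, `output`, `fl` and `filter` parameters are the publicly documented CDX server API as best understood here; treat them as an assumption and check the current documentation. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain):
    """Build a CDX query listing every archived URL under domain."""
    params = {
        "url": domain + "/*",            # prefix match on the whole site
        "output": "json",                # rows as JSON lists, header row first
        "fl": "timestamp,original",      # only the fields we need
        "filter": "statuscode:200",      # skip redirects and error pages
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def latest_snapshots(rows):
    """Map each original URL to a link to its newest archived capture.

    Timestamps are 14-digit YYYYMMDDhhmmss strings, so plain string
    comparison orders them correctly.
    """
    if not rows:
        return {}
    header, *records = rows
    ts_i = header.index("timestamp")
    orig_i = header.index("original")
    latest = {}
    for record in records:
        ts, original = record[ts_i], record[orig_i]
        if original not in latest or ts > latest[original]:
            latest[original] = ts
    return {
        original: "https://web.archive.org/web/%s/%s" % (ts, original)
        for original, ts in sorted(latest.items())
    }


def fetch_latest_snapshots(domain):
    """Run the query (needs network access) and summarize the results."""
    with urlopen(cdx_query_url(domain), timeout=30) as resp:
        body = resp.read().decode("utf-8")
    rows = json.loads(body) if body.strip() else []
    return latest_snapshots(rows)
```

Keeping only the newest capture per URL matters for recovery: an older capture of the same page is exactly the stale copy you are trying to get past.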


Google Cache
They should go ahead and rename this company "The Matrix". I had seen in other Google searches that they offer a link to a cached copy of many web pages in case the site is down or gone. But I had no idea how many sites they had indexed and copied. I was able to see and recover copies of 80% of my web pages by entering my site as a search term. They only had the text, not the pictures, but I keep copies of my trip pictures in a safe location, so I just had to resize them for web use again. At least I was able to recover the comments and text.


Odd Items
While doing this, I found a few odd things:

1. Google did not save some of my most frequently viewed pages at all, particularly the pages about the house remodeling. I know people looked at these pages because they mentioned them to me, but Google did not save a single one of the three.

2. Google can find where a page is, and Google can save a copy, but a search does not necessarily turn up every page it has a copy of. Lol. I searched for TimJenkinsWeb.com and it found 50 of my 300 or so pages. There was one page in particular I really wanted to recover, so on a whim I searched for the page by its full path, like TimJenkinsWeb.com/trip/niftyplace, and this time it said, yes, I have a copy of that page.

3. A couple of years ago I bought a second domain name and started sending everyone to TimJenkinsWeb instead of JinksWeb (I like JinksWeb, but no two people spell "jinks" the same way, and everyone forgot it). Because of this, Google had indexed some pages under one name and some under the other. Once again, no rhyme or reason.